Overview

Education For Employment (EFE) is the leading nonprofit that trains youth and links them to jobs across the Middle East and North Africa (MENA). This pivotal region is the hardest place on the planet for youth to get their first job – they are three times more likely to be unemployed than older adults.

EFE wants to understand how effective its programs are, particularly whether graduates find stable employment. We have data on about 7,000 participants in almost 500 program cohorts spread across 8 countries. Participants bring diverse skills, interests, and backgrounds. Programs employ a variety of training models and placement policies. How well are different programs working, and for whom?

Data

EFE has a Salesforce database that houses all information about the organization’s programs, participants, and job placement and retention outcomes. The datasets used in this project include:

  • Participant demographics (program applications): Contacts.csv
  • Pre- and post-program participation surveys (job skills, confidence,…): PrePost Surveys.csv
  • Employment information at 3-month intervals for up to 2 years: Employment Status Checks Blind.csv

Data Pre-Processing

Initially, the data exports from Salesforce were pre-processed in the following ways:

  • To protect PII, Contact ID, Program Name, Company Name, and Class Name were all anonymized by replacing the actual IDs/Names with new, numerical IDs. Lookup tables were sent to EFE so that they can bring the actual values back in if company, program, or class name are significant in the analysis.
  • The applications dataset sometimes contained more than one application per person. Information from earlier applications was deemed less important, so only the most recent application was retained, and a new column was added containing the number of applications each person submitted. The application data was then joined to the contact data to create a more complete profile for each program participant.
  • Any individuals who were not present in the Employment Status Check dataset were removed from the contact and pre/post survey datasets.
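
The anonymization and de-duplication steps above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual preprocessing script; the field names `contact_id` and `submitted` and both helper functions are hypothetical.

```python
from datetime import date
from itertools import count


def anonymize(values):
    """Replace each distinct value with a new numeric ID.

    Returns the list of IDs plus the lookup table needed to restore
    the original values later.
    """
    lookup = {}
    next_id = count(1)
    ids = []
    for value in values:
        if value not in lookup:
            lookup[value] = next(next_id)
        ids.append(lookup[value])
    return ids, lookup


def latest_applications(applications):
    """Keep each person's most recent application and count their submissions."""
    latest = {}
    counts = {}
    for app in applications:
        cid = app["contact_id"]  # hypothetical field name
        counts[cid] = counts.get(cid, 0) + 1
        if cid not in latest or app["submitted"] > latest[cid]["submitted"]:
            latest[cid] = app
    return [dict(app, n_applications=counts[cid]) for cid, app in latest.items()]


# Made-up example records, for illustration only.
example_apps = [
    {"contact_id": 1, "submitted": date(2015, 1, 1)},
    {"contact_id": 1, "submitted": date(2016, 1, 1)},
    {"contact_id": 2, "submitted": date(2015, 6, 1)},
]
deduplicated = latest_applications(example_apps)
```

Returning the lookup table alongside the new IDs mirrors the idea of sending lookup tables back to EFE so the real names can be restored if needed.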

The preprocessing script can be viewed on GitHub so that the deidentification steps can be reproduced with new data exports.

Feature Engineering

Job Placement

The contact dataset contains 8 columns that relate to when each participant obtained employment. These can be collapsed into a single column that records either how long it took the participant to be placed, or that they were not placed or could not be reached. Below, the new composite column is on the right; once it exists, the original job placement columns can be removed.
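
One way to picture the collapsing step is the sketch below. The original column names are not given here, so the column names, labels, and the `unreachable` flag are all hypothetical, and only three of the 8 columns are mocked up.

```python
# Hypothetical placement columns, ordered from earliest to latest window.
PLACEMENT_COLUMNS = [
    ("placed_at_graduation", "At graduation"),
    ("placed_within_3_months", "Within 3 months"),
    ("placed_within_6_months", "Within 6 months"),
]


def placement_status(row):
    """Collapse several placement flags into one composite value."""
    # The earliest flagged window wins.
    for column, label in PLACEMENT_COLUMNS:
        if row.get(column):
            return label
    # No placement recorded: distinguish unreachable from not placed.
    if row.get("unreachable"):  # hypothetical flag
        return "Could not be reached"
    return "Not placed"
```

In this sketch a recorded placement takes precedence over the unreachable flag; that ordering is an assumption, not something stated in the data description.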

Job Retention

Job retention at 6 months is the initial outcome variable in the analysis. Participants who could not be reached 6 months after job placement are therefore filtered out, as are participants who graduated less than 6 months before the data was pulled and participants who got a job more than 30 days before graduating. After these steps, 2652 of the original 7124 participants remain in the dataset used for the analysis.

Overall, 47.6% of these 2652 participants had retained employment 6 months after being placed in a job.
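
The three filters and the overall retention rate could be expressed along these lines. This is a sketch under stated assumptions: the field names (`graduated`, `job_start`, `retained_6mo`) are hypothetical, and "6 months" is approximated as 183 days.

```python
from datetime import date, timedelta

SIX_MONTHS = timedelta(days=183)       # assumption: 6 months approximated as 183 days
EARLY_JOB_LIMIT = timedelta(days=30)


def keep_for_retention_analysis(participant, data_pulled):
    """Return True if the participant passes all three filters."""
    # Could not be reached at the 6 month check (outcome unknown).
    if participant["retained_6mo"] is None:
        return False
    # Graduated less than 6 months before the data was pulled.
    if data_pulled - participant["graduated"] < SIX_MONTHS:
        return False
    # Got a job more than 30 days before graduating.
    if participant["graduated"] - participant["job_start"] > EARLY_JOB_LIMIT:
        return False
    return True


def retention_rate(participants):
    """Share of participants who retained employment at 6 months."""
    retained = sum(1 for p in participants if p["retained_6mo"])
    return retained / len(participants)
```

`retention_rate` expects a non-empty list that has already been filtered, so every `retained_6mo` value is either True or False.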

Employment Status Check Surveys

Because the analysis initially focuses on retention at 6 months, the employment status check data is filtered to contain only the 6 month surveys. Those survey responses can then be joined to the participant contact information while retaining a 1:1 relationship.
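
A rough sketch of that filter-then-join step follows. The field names (`contact_id`, `months_after_placement`, `employed`) are hypothetical; the point is that the join refuses to proceed if a participant has more than one 6 month check, which is what keeps the relationship 1:1.

```python
def join_six_month_checks(contacts, status_checks):
    """Attach each contact's 6 month status check, enforcing a 1:1 match."""
    six_month = {}
    for check in status_checks:
        if check["months_after_placement"] != 6:
            continue  # keep only the 6 month surveys
        cid = check["contact_id"]
        if cid in six_month:
            raise ValueError(f"duplicate 6 month check for contact {cid}")
        six_month[cid] = check

    joined = []
    for contact in contacts:
        check = six_month.get(contact["contact_id"])
        if check is not None:
            joined.append({**contact, "employed_6mo": check["employed"]})
    return joined
```

Contacts without a 6 month check are dropped here (an inner join); whether to keep them with a missing value instead is a design choice this sketch does not settle.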

Pre and Post Training Surveys

The pre and post training surveys contain questions around confidence and self-efficacy that EFE is interested in looking into. There are five different questions relating to confidence, with answers on a “not at all confident” to “very confident” scale. These answers are turned into numbers so that a composite index can be created, and so that changes in confidence after participants have been through the training programs can be more easily measured.
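
As an illustration of the scoring idea, the sketch below maps answers to numbers, averages the five questions into a composite, and takes the post minus pre difference. Only the two endpoint labels come from the survey description; the intermediate labels, the 1 to 5 coding, and the use of a simple mean are assumptions.

```python
CONFIDENCE_SCALE = {
    "not at all confident": 1,
    "slightly confident": 2,   # hypothetical intermediate label
    "somewhat confident": 3,   # hypothetical intermediate label
    "confident": 4,            # hypothetical intermediate label
    "very confident": 5,
}


def composite_score(answers):
    """Average the numeric codes of one participant's confidence answers."""
    scores = [CONFIDENCE_SCALE[a.strip().lower()] for a in answers]
    return sum(scores) / len(scores)


def confidence_change(pre_answers, post_answers):
    """Positive values mean confidence rose between the pre and post surveys."""
    return composite_score(post_answers) - composite_score(pre_answers)
```

For example, a participant who answers "not at all confident" to all five questions before training and "very confident" to all five afterwards would get a change score of 4.0 under this coding.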

Of the 2652 participants who remain in the filtered dataset, 793 have pre/post survey data. A sample of the calculated confidence and self-efficacy change scores is shown below. These scores are then joined to the contacts dataset.

Exploratory Data Analysis

6 Month Retention Rates by Gender

6 Month Retention Rates by Country

6 Month Retention Rates by Time of Placement

6 Month Retention Rates by Position Type

Confidence and Self-Efficacy Scores

The plots below use the composite confidence and self-efficacy scores to show overall changes between the pre and post surveys. These plots include all participants who have pre and post survey data, not only those who also have 6 month job retention data.

Modeling

A series of models will be tested to determine which features, if any, are important in predicting whether participants retained the job they were placed in after 6 months.